Abstract

There are many applications of graph cuts in computer vision, e.g. segmentation. We present a novel method to reformulate the NP-hard, k-way graph partitioning problem as an approximate minimal s-t graph cut problem, for which a globally optimal solution is found in polynomial time. Each non-terminal vertex in the original graph is replaced by a set of $\lceil \log_2 k \rceil$ new vertices. The original graph edges are replaced by new edges connecting the new vertices to each other and to only two terminal nodes, the source s and the sink t. The weights of the new edges are obtained using a novel least squares solution approximating the constraints of the initial k-way setup. The minimal s-t cut labels each new vertex with a binary (s vs t) "Gray" encoding, which is then decoded into a decimal label number that assigns each of the original vertices to one of k classes. We analyze the properties of the approximation and present quantitative as well as qualitative segmentation results.

Keywords: graph cuts, graph partition, multi-way cut, s-t cut, max-flow min-cut, binary Gray code, least squares, pseudoinverse, image segmentation.

1 Introduction 

Many computer vision problems can be formulated as graph labeling problems, e.g. 
segmentation [ ], denoising [ ], registration [33, 42], point correspondence from stereo
pairs [ ], and shape matching [ ]. The general problem is to assign one label, out of a
finite set of labels, to each vertex of the graph in some optimal and meaningful way. In 
some cases, this may be formulated as a graph cut problem, where the task is to separate 
the vertices into a number of groups with a common label assigned to all vertices in 
each group. 
1.1 Graph cuts without label interaction

The multi-way or k-way cut problem is defined as follows. Given a graph $G(V,E)$ with vertices $v_i \in V$ and edges $e_{v_i,v_j} = e_{ij} \in E \subseteq V \times V$ with positive edge weights $w(e_{ij})$ or $w_{ij}$, where $w_{ij} = w_{ji}$ (i.e. an undirected graph), find an optimal k-way cut (or k-cut) $C^* \subseteq E$ with minimal cost, $C^* = \arg\min_C |C|$, where $|C| = \sum_{e_{ij} \in C} w_{ij}$, such that restricting the edges to $E \setminus C$ breaks the graph into k disconnected subgraphs, dividing the vertices into k groups with all vertices in one group sharing one of the labels $\mathcal{L} = \{l_0, l_1, \ldots, l_{k-1}\}$. We denote the number of vertices and edges in the graph as $|V|$ (i.e. $V = \{v_1, \ldots, v_{|V|}\}$) and $|E|$, respectively.

This k-cut formulation assumes that all the semantics of the computer vision problem at hand can be encoded into the graph edge weights. For this class of cut problems, the discrete optimization and graph theory communities have made notable progress. For k = 2, the problem reduces to partitioning V into 2 sets (binary labeling). The min-cut max-flow theorem equates the minimal cut in this case (k = 2) to the maximal network flow from one of the terminal vertices, the source s, to the other, the sink t. For $k \ge 3$, the general multi-way cut problem is known to be NP-hard. However, for the special case of planar graphs, a solution can be found in polynomial time [ , ]. The minimal cut separating the source from the sink (k = 2) can always be found in polynomial time, e.g. using the Ford-Fulkerson [ ], Edmonds-Karp [14], or Goldberg-Tarjan [ ] algorithms, even for non-planar graphs. When k = 2 and there are constraints on the sizes of the partitioned sets, however, the problem is NP-hard. Several non-global solutions have been proposed for the multi-way cut ($k \ge 3$) problem. Dahlhaus et al. proposed an algorithm based on the union of individual 2-way cuts that is guaranteed to produce a cut cost within a factor of $2 - 2/k$ of the optimal cut cost [ ]. Through a linear programming relaxation algorithm, Calinescu et al. improve the approximation ratio to at most $1.5 - 1/k$ [ ]. For k = 3, the $7/6$ ($= 1.5 - 1/3$) factor of [12] is further improved by Karger et al. to $12/11$ using a geometric embedding approach [ ]. Zhao et al. approximate the k-way cut via a set of minimum 3-way cuts with approximation ratios of $2 - 3/k$ for odd k and $2 - (3k - 4)/(k^2 - k)$ for even k [ ]. In [ ], Goldschmidt and Hochbaum show that the NP-completeness of the specified vertices problem does not imply NP-completeness of the k-cut problem without specified fixed terminal vertices.
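Since the method in Section 2 relies on the polynomial-time k = 2 case, here is a minimal sketch of the Edmonds-Karp variant of Ford-Fulkerson (shortest augmenting paths found by breadth-first search) for an undirected graph. It assumes a dense capacity matrix, which is fine for toy graphs but not for images; the function name `min_st_cut` and its return convention are our own, not part of any library. By the min-cut max-flow theorem the returned flow value equals the minimal cut cost, and the vertices still reachable from s in the residual graph form the source side of a minimal cut.

```python
from collections import deque


def min_st_cut(n, edges, s, t, eps=1e-12):
    """Minimal s-t cut of an undirected weighted graph via Edmonds-Karp.

    n: number of vertices, labelled 0 .. n-1
    edges: list of (u, v, w) undirected edges with non-negative weight w
    Returns (cut_value, source_side).
    """
    residual = [[0.0] * n for _ in range(n)]
    for u, v, w in edges:
        residual[u][v] += w  # an undirected edge acts as two opposite arcs
        residual[v][u] += w

    def bfs_parents():
        # Breadth-first search over arcs with positive residual capacity
        parent = [None] * n
        parent[s] = s
        queue = deque([s])
        while queue:
            u = queue.popleft()
            for v in range(n):
                if parent[v] is None and residual[u][v] > eps:
                    parent[v] = u
                    queue.append(v)
        return parent

    flow = 0.0
    while True:
        parent = bfs_parents()
        if parent[t] is None:
            break  # no augmenting path left: the flow is maximal
        path, v = [], t
        while v != s:
            path.append((parent[v], v))
            v = parent[v]
        bottleneck = min(residual[u][v] for u, v in path)
        for u, v in path:
            residual[u][v] -= bottleneck  # use up forward capacity
            residual[v][u] += bottleneck  # allow the flow to be undone later
        flow += bottleneck

    parent = bfs_parents()
    source_side = {v for v in range(n) if parent[v] is not None}
    return flow, source_side


edges = [(0, 1, 3), (0, 2, 2), (1, 2, 1), (1, 3, 2), (2, 3, 3)]
value, side = min_st_cut(4, edges, s=0, t=3)
```

In this small example every edge out of s is saturated, so the cut value is 5 and only s remains on the source side.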

1.2 Markov random fields

In computer vision applications, it is sometimes difficult, if not impossible, to encode the semantics of the computer vision problem into the topology and edge weights of a graph and then proceed by applying a graph cut method. Instead, it is often desirable that the label assigned to a vertex in the graph be influenced by the labels assigned to other vertices. This means that the labeling resulting from the cut has an impact on the cost of the cut itself (i.e. the cost does not depend only on the weights of the severed edges). In Markov random fields (MRF) [ ], the dependence of a vertex label on the labels of all other vertices is modeled through its dependence on the labels of its immediate neighbors only. In image segmentation, for example, each pixel is represented by a vertex in a graph, and the graph edges capture the neighborhood relationship between pixels (e.g. via 4- or 8-connectivity in 2D). It is usually desirable that the graph partitioning and, hence, the induced labeling satisfy two possibly conflicting criteria: (i) a vertex (e.g. pixel) is labeled according to the data (e.g. image) value at that particular vertex only; and (ii) neighboring vertices (e.g. pixels) are assigned identical or similar labels. In general, the latter criterion regularizes the solution and makes it more robust to noise.

MRF theory can be adopted to formulate this desired behavior as an objective function to be minimized, using discrete optimization methods, with respect to the possible vertex labels over the whole graph. Given a graph G(V,E), the MRF energy function can be written as

$$E(l) = \sum_{v_i \in V} D_i(l_i) + \lambda \sum_{(v_i,v_j) \in E} V_{ij}(l_i, l_j, d_i, d_j) \tag{1}$$

where $v_i$ and $v_j$ are vertices (e.g. corresponding to two pixels p and q) with data values $d_i$ and $d_j$ (e.g. the image values I(p) and I(q)) and with labels $l_i$ and $l_j$, respectively. $D_i(l_i)$ is the data term (or image fidelity) measuring the penalty of labeling $v_i$ with a specific label, disregarding the labels or data values of all other (neighboring or distant) vertices, and $V_{ij}$ is the vertex interaction term that penalizes certain label configurations of neighboring vertices $v_i$ and $v_j$, i.e. the penalty of assigning label $l_i$ to $v_i$ and $l_j$ to $v_j$. $\lambda$ controls the relative importance of the two terms. $V_{ij}$ can be seen as a metric on the space of labels, $V_{ij} = V_{ij}(l_i, l_j)$ (also called the prior), may be chosen to depend on the underlying data, $V_{ij} = V_{ij}(d_i, d_j)$, or both, $V_{ij}(l_i, l_j, d_i, d_j)$, e.g.

$$V_{ij}(l_i, l_j, d_i, d_j) = V^l_{ij}(l_i, l_j)\, V^d_{ij}(d_i, d_j) \tag{2}$$

where superscripts l and d denote label and data interaction penalties, respectively. Various label interaction penalties have been proposed, including linear, $V^l_{ij} = |l_i - l_j|$, quadratic, $(l_i - l_j)^2$, truncated versions thereof, $\min\{T, |l_i - l_j|\}$ or $\min\{T, (l_i - l_j)^2\}$ with threshold T, or the Potts model, $1 - \delta_{l_i,l_j}$, based on the Kronecker delta [ ]. Various spatially varying penalties depending on the underlying image data have also been proposed, e.g. Gaussian, $V^d_{ij} = e^{-\beta (d_i - d_j)^2}$, or reciprocal, $V^d_{ij} = 1/(1 + \beta (d_i - d_j)^2)$, with a scale parameter $\beta$ [ ].
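To make the energy (1) and the separable interaction (2) concrete, the sketch below codes the listed penalties and evaluates E(l) for a labeling. The function names, the default thresholds T = 2 and the scale beta = 1 are illustrative assumptions, not values used in this paper.

```python
import math


# Label interaction penalties V^l(li, lj); the threshold defaults are arbitrary
def linear(li, lj):
    return abs(li - lj)


def quadratic(li, lj):
    return (li - lj) ** 2


def truncated_linear(li, lj, T=2):
    return min(T, abs(li - lj))


def truncated_quadratic(li, lj, T=2):
    return min(T, (li - lj) ** 2)


def potts(li, lj):
    # 1 - Kronecker delta: no cost iff the two labels agree
    return 0 if li == lj else 1


# Data-dependent penalties V^d(di, dj); beta is an assumed scale parameter
def gaussian(di, dj, beta=1.0):
    return math.exp(-beta * (di - dj) ** 2)


def reciprocal(di, dj, beta=1.0):
    return 1.0 / (1.0 + beta * (di - dj) ** 2)


def mrf_energy(labels, data, edges, D, Vl, Vd, lam=1.0):
    """Energy (1) using the separable interaction (2), V_ij = V^l * V^d.

    labels[i], data[i]: label l_i and data value d_i of vertex v_i
    edges: list of neighbouring index pairs (i, j)
    D(i, l): data term D_i(l)
    """
    unary = sum(D(i, l) for i, l in enumerate(labels))
    pairwise = sum(Vl(labels[i], labels[j]) * Vd(data[i], data[j])
                   for i, j in edges)
    return unary + lam * pairwise


data = [0, 0, 3, 3]
D = lambda i, l: (data[i] - l) ** 2  # squared data fidelity (an assumption)
E = mrf_energy([0, 0, 3, 3], data, [(0, 1), (1, 2), (2, 3)], D, potts, gaussian)
```

Here the labeling equals the data, so the unary sum vanishes and only the edge (1, 2) pays a Potts penalty weighted by the Gaussian $e^{-9}$.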

1.3 MRF for computer vision 

In computer vision applications, e.g. segmenting an image into k regions, an image with P pixels is typically modeled as a graph with P vertices, one for each pixel (i.e. each vertex is mapped to a location in $\mathbb{Z}^d$, where d is the dimensionality of the image). To encode the data term penalty $D_i(l_i)$, the graph is typically augmented with k new terminal vertices $\{t_j\}_{j=0}^{k-1}$, each representing one of the k labels (Figure 2a). The weight of the edge connecting a non-terminal vertex $v_i$ (representing pixel p) to a terminal vertex $t_j$ (representing label $l_j$) is set inversely proportional to $D_i(l_j)$: the higher the penalty of labeling $v_i$ with $l_j$, the smaller the edge weight and hence the more likely the edge will be severed, i.e.

$$w_{v_i,t_j} = 1/D_i(l_j), \quad \forall v_i \in V,\ t_j \in \{t_j\}_{j=0}^{k-1}. \tag{3}$$
Setting the edge weights according to an arbitrary vertex interaction penalty $V_{ij}(l_i, l_j, d_i, d_j)$ is not straightforward. In the special case when $V_{ij}(l_i, l_j, d_i, d_j) = V^d_{ij}(d_i, d_j)$, i.e. independent of the labels $l_i$ and $l_j$, $V_{ij}$ is typically encoded through the weights of the graph edges connecting vertices representing neighboring pixels. The higher the edge weight, the less likely the edge will be severed and hence the more likely the two pixels will be assigned the same label. This motivates setting the edge weights proportional to $V^d_{ij}$, i.e.

$$w_{v_i,v_j} \propto V^d_{ij}(d_i, d_j) \quad \forall e_{ij} \in E. \tag{4}$$

This approach, however, discourages (or encourages) cutting the edge between neighboring vertices, and hence encourages assigning the same (or different) labels to them, without any regard to what these labels are. Clearly, (4) is not flexible enough to encode more elaborate label interactions (since, essentially, $V^l_{ij}(l_i, l_j)$ = constant). In fact, this issue is at the heart of the challenging multi-label MRF optimization problem: developing globally (or close to globally) optimal algorithms for any interaction penalty.

Greig et al. presented one of the earliest works (1989) on combinatorial optimization approaches to a computer vision problem [ ]. They constructed a two-terminal graph whose minimal cut gives a globally optimal binary vector used for restoring binary images. In earlier works, iterative algorithms, such as simulated annealing, were employed to solve MRF problems. Later (1993), Wu and Leahy applied a graph theoretic approach to data clustering for image segmentation [ , ]. The 1997 work of Shi and Malik on normalized cuts [39, 40] sparked large interest in graph-based image partitioning. In [ ], the cost of a partition is defined as the total weight of the edges connecting the two partitions as a fraction of the total edge connections to all the nodes in the graph, which is written as a Rayleigh quotient whose global minimum is obtained as the solution of a generalized eigensystem. In 1998, Roy and Cox [ ] reformulated the multi-camera correspondence problem as a max-flow min-cut problem. Boykov and Jolly applied the min-cut max-flow graph cut algorithm to find the globally optimal binary segmentation [ ].

As mentioned earlier, in the multi-label problem ($k \ge 3$) the global minimum is generally not attainable in polynomial time. In the special case of a convex label interaction penalty, also known as a convex prior, Ishikawa proposed a method that achieves the global energy minimizer [ ]. Ishikawa's convex prior condition is given by:

$$2 V^l_{ij}(l_i - l_j) \le V^l_{ij}(l_i - l_j + 1) + V^l_{ij}(l_i - l_j - 1) \tag{5}$$

This convexity definition was later generalized in [ ]. However, with convex priors, e.g. the quadratic $V^l_{ij} = (l_i - l_j)^2$, the penalty for assigning different labels to neighboring pixels can become excessively large, which in turn over-smooths the label field because several small changes in the label can yield a lower cost than a single sudden change. This encourages pixels on opposite sides of an interface between two different regions to be assigned the same or similar labels, although ideally they should not be. This motivates the introduction of non-convex priors, typically achieved by truncating the penalty (e.g. the truncated quadratic $\min\{T, (l_i - l_j)^2\}$ or the Potts model), to allow for discontinuities in the label field at the cost of no longer guaranteeing a globally optimal energy minimizer.
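Condition (5) is easy to test numerically for a penalty written as a function of the label difference $x = l_i - l_j$. The helper below is a hypothetical illustration: it checks (5) at every interior difference reachable with k labels, i.e. $|x| \le k - 2$ so that $x \pm 1$ is still a valid difference.

```python
def is_convex_prior(g, max_diff):
    """Check Ishikawa's condition (5): 2 g(x) <= g(x + 1) + g(x - 1).

    g: penalty as a function of the label difference x = l_i - l_j
    max_diff: largest possible |x|, i.e. k - 1 for labels 0 .. k-1
    """
    return all(2 * g(x) <= g(x + 1) + g(x - 1)
               for x in range(-max_diff + 1, max_diff))


k = 5
print(is_convex_prior(lambda x: x * x, k - 1))            # quadratic
print(is_convex_prior(lambda x: min(4, x * x), k - 1))    # truncated quadratic
```

The quadratic passes, while truncation breaks (5) at x = 2 (2 * 4 > 4 + 1), which is exactly the loss of convexity discussed above.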
This tradeoff, between a guaranteed global minimum with a convex prior and a discontinuity-preserving prior whose global minimum cannot be guaranteed, has sparked strong interest within the computer vision community in improving the state of the art in optimizing multi-label problems with non-convex priors. In their seminal work, Geman and Geman applied simulated annealing based optimization [ ]. In [ ], Iterated Conditional Modes was proposed. In [ ], to segment multiple regions, a recursive sub-optimal approach is used, which entails deciding whether the current partition should be further subdivided and repartitioning if necessary. In a somewhat reverse approach, Felzenszwalb and Huttenlocher's algorithm assigns a different label to each vertex, after which similar pixels are merged using a greedy decision approach [ ]. Boykov et al. proposed two algorithms that rely on an initial labeling and an iterative application of binary graph cuts. At each iteration, an optimal range move is performed to either expand (α-expansion) or swap labels (α-β-swap) [ , ]. Although convergence and error bounds are guaranteed, the initial labeling may influence the result of the algorithm. In [ ], Veksler proposes a new type of range moves that act on a larger set of labels than those in [ ]. LogCut [ ] is another iterative range-move-based algorithm that applies the binary graph cut at successive bit levels of the binary encodings of the integer labels (from most significant to least significant) rather than once for each possible label value. In [27], the image is partitioned into two regions by computing a minimum cut with a swap move of binary labels, and then the same procedure is recursively applied to each region to obtain new regions until some stopping condition is met. Recently, Szeliski et al. presented a study comparing energy minimization methods for MRFs [ ].

The aforementioned range-move approaches are regarded as the state of the art in solving multi-label assignment problems in the computer vision community. It is important to note, however, that the α-β-swap algorithm can only be applied when $V^l_{ij}$ is a semi-metric [ ], i.e. when it satisfies both conditions

$$V^l_{ij}(\alpha, \beta) = 0 \Leftrightarrow \alpha = \beta \tag{6}$$

$$V^l_{ij}(\alpha, \beta) = V^l_{ij}(\beta, \alpha) \ge 0 \tag{7}$$

On the other hand, α-expansion is even more restricted and can only be applied when $V^l_{ij}$ is a metric [ ]: in addition to the two above conditions, the following triangle inequality must also hold

$$V^l_{ij}(\alpha, \beta) \le V^l_{ij}(\alpha, \gamma) + V^l_{ij}(\gamma, \beta) \tag{8}$$

These label-interaction restrictions (convex, semi-metric, metric) limit the applications of graph cut algorithms, since the semantics of the computer vision problem cannot always be easily formulated to abide by these restrictions.
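Conditions (6)-(8) can be checked by brute force over a finite label set. The sketch below (function names are our own) shows, for example, that the truncated quadratic is a semi-metric but not a metric, so α-β-swap applies to it while α-expansion does not.

```python
from itertools import product


def is_semi_metric(V, labels):
    """Conditions (6)-(7): V(a, b) = 0 iff a = b, and V(a, b) = V(b, a) >= 0."""
    return all((V(a, b) == 0) == (a == b) and V(a, b) == V(b, a) >= 0
               for a, b in product(labels, repeat=2))


def is_metric(V, labels):
    """Semi-metric plus the triangle inequality (8) for every triple."""
    return is_semi_metric(V, labels) and all(
        V(a, b) <= V(a, c) + V(c, b)
        for a, b, c in product(labels, repeat=3))


labels = range(4)
trunc_quad = lambda a, b: min(4, (a - b) ** 2)
trunc_lin = lambda a, b: min(2, abs(a - b))
print(is_semi_metric(trunc_quad, labels), is_metric(trunc_quad, labels))
print(is_metric(trunc_lin, labels))
```

For the truncated quadratic, $V(0,2) = 4$ exceeds $V(0,1) + V(1,2) = 2$, so (8) fails; the truncated linear penalty satisfies all three conditions.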

More recent approaches to the multi-label MRF optimization problem are based on linear programming relaxation, using primal-dual methods [ ], message passing, and partial optimality [ ].

1.4 Contributions 

In this work, we propose a novel method to convert the multi-label MRF optimization problem into a binary labeling of a new graph with a specific topology. The edge weights of the new graph are obtained as the least-squares (LS) solution of a system of linear equations, which minimizes the error in reproducing the original penalties. The resulting binary labeling problem is solved via a single application of an s-t cut, i.e. the solution is non-iterative and does not depend on any initialization. Once the binary labeling is obtained, it is directly decoded back into the desired non-binary labeling. The method accommodates any label or data interaction penalties, i.e. any $V_{ij}(l_i, l_j, d_i, d_j)$, e.g. non-convex or non-metric priors or spatially varying penalties. Further, besides its optimality properties, LS enables offline pre-computation of pseudoinverse matrices that can be reused for different graphs.

2 Method 

2.1 Reformulating multi-label MRF as an s-t cut

Given a graph $G(V,E)$¹, the objective is to label each vertex $v_i$ with a label $l_i \in \mathcal{L} = \{l_0, l_1, \ldots, l_{k-1}\}$. The key idea of our method is: rather than labeling $v_i$ with $l_i \in \mathcal{L}$, we replace the vertex $v_i$ with b vertices $v_{ij}$, $j \in \{1, 2, \ldots, b\}$, and label each $v_{ij}$ with a binary label $l_{ij} \in \mathcal{L}_2 = \{0, 1\}$. Assigning label $l_i$ to vertex $v_i$ in G(V,E) entails assigning a corresponding sequence of binary labels $(l_{ij})_{j=1}^{b}$ to $(v_{ij})_{j=1}^{b}$. We distinguish between the decimal (base 10) and binary (base 2) encodings of the labels using the notation $(l_i)_{10}$ and $(l_i)_2 = (l_{i1} l_{i2} \cdots l_{ib})_2$, respectively, with $l_i \in \mathcal{L}$ and $l_{ij} \in \mathcal{L}_2$. Consequently, the original graph G(V,E) is transformed into $G_2(V_2, E_2)$, where subscript 2 denotes the binary representation (or binary encoding). $V_2$ is given as

$$V_2 = \left\{\{v_{ij}\}_{j=1}^{b}\right\}_{i=1}^{|V|} \cup \{s, t\} \tag{9}$$

with $|V_2| = b|V| + 2$, i.e. b vertices in $V_2$ for each vertex in V and two terminal vertices, the source s and the sink t. b must be chosen large enough that the binary labeling of the b vertices $v_{ij}$ can be decoded back into a label $l_i \in \mathcal{L}$ for $v_i$. In other words, b is the number of bits needed to encode k labels. Therefore, we must have $2^b \ge k$, or

$$b = \lceil \log_2 k \rceil. \tag{10}$$
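Equation (10) and the vertex count of (9) in a few lines; a minimal sketch with hypothetical helper names. Using the integer `bit_length` instead of a floating-point `log2` sidesteps any rounding concerns.

```python
def num_bits(k):
    """b = ceil(log2 k) from (10), computed with integer arithmetic."""
    if k < 2:
        raise ValueError("at least two labels are needed")
    return (k - 1).bit_length()


def num_g2_vertices(num_vertices, k):
    """|V_2| = b |V| + 2 from (9): b binary vertices per vertex plus s and t."""
    return num_bits(k) * num_vertices + 2


print([num_bits(k) for k in (2, 3, 4, 5, 8, 9)])
```

This prints `[1, 2, 2, 3, 3, 4]`; note that k = 5 already needs b = 3, leaving three unused codes (see Section 2.5).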

Each edge in $E_2$ is either a terminal link (t-link), a neighborhood link (n-link), or an intra-link, i.e.

$$E_2 = E_2^{tlinks} \cup E_2^{nlinks} \cup E_2^{intra}, \tag{11}$$

where (Figure 1)

$$E_2^{tlinks} = E_2^{t} \cup E_2^{s}, \tag{12}$$

$$E_2^{nlinks} = E_2^{c} \cup E_2^{\bar{c}}. \tag{13}$$

Edges in $E_2^{t}$ and $E_2^{s}$ connect each non-terminal vertex in $V_2$ to the terminals t and s, respectively; therefore $|E_2^{t}| = |E_2^{s}| = b|V|$. With b vertices in $V_2$ replacing each vertex in V, up to $b^2$ unique edges can connect vertices in $V_2$ that correspond to neighboring vertices i and j in V. $E_2^{nlinks}$ contains all these edges for all pairs of neighboring vertices, i.e. $|E_2^{nlinks}| = b^2|E|$. We distinguish between two types of n-links: $E_2^{c}$ is limited to the sparse set of b edges per pair that connect corresponding vertices $(v_{im}, v_{jm})$ (note the same m in both vertices), i.e. $|E_2^{c}| = b|E|$, whereas $E_2^{\bar{c}}$ contains all the remaining edges that connect non-corresponding vertices $(v_{im}, v_{jn})$, $m \neq n$, i.e. $|E_2^{\bar{c}}| = (b^2 - b)|E|$. $E_2^{intra}$ includes edges that connect pairs of vertices $(v_{im}, v_{in})$ (note the same i) in $V_2$ that are among the set of vertices representing a single vertex in V, yielding $|E_2^{intra}| = \binom{b}{2}|V|$. Formally,

$$\begin{aligned}
E_2^{t} &= \{e_{v_{ij},t}\ \forall v_{ij} \in V_2 \setminus \{s,t\}\}\\
E_2^{s} &= \{e_{v_{ij},s}\ \forall v_{ij} \in V_2 \setminus \{s,t\}\}\\
E_2^{c} &= \{e_{im,jm}\ \forall e_{ij} \in E\}\\
E_2^{\bar{c}} &= \{e_{im,jn}\ \forall e_{ij} \in E,\ m \neq n\}\\
E_2^{intra} &= \{e_{im,in}\ \forall v_i \in V,\ m < n\}
\end{aligned} \tag{14}$$

¹ There are no terminal vertices in the original graph. Nevertheless, we draw terminal vertices in Figure 2(a,d) to clarify that in a multi-way cut each vertex is assigned a label.
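The edge sets in (14) can be enumerated directly, which also confirms the counts stated above. In this sketch a binary vertex $v_{ij}$ is represented by the tuple `(i, j)` with a 0-based bit position j, and the terminals by the strings `"s"` and `"t"`; this representation and the function name are hypothetical choices for illustration.

```python
from itertools import combinations, product


def build_g2_edges(num_vertices, edges, b):
    """Enumerate E_2^t, E_2^s, E_2^c, E_2^c-bar and E_2^intra of (14).

    edges: list of neighbouring pairs (i, j) of the original graph G(V, E)
    """
    t_links = [((i, j), "t") for i in range(num_vertices) for j in range(b)]
    s_links = [((i, j), "s") for i in range(num_vertices) for j in range(b)]
    # n-links between corresponding bits (same m) ...
    corresponding = [((i, m), (j, m)) for i, j in edges for m in range(b)]
    # ... and between non-corresponding bits (m != n)
    non_corresponding = [((i, m), (j, n)) for i, j in edges
                         for m, n in product(range(b), repeat=2) if m != n]
    # intra-links among the b vertices that replace a single v_i
    intra = [((i, m), (i, n)) for i in range(num_vertices)
             for m, n in combinations(range(b), 2)]
    return t_links, s_links, corresponding, non_corresponding, intra


t, s, c, nc, intra = build_g2_edges(4, [(0, 1), (1, 2), (2, 3)], 3)
```

For this 4-vertex path with b = 3 the counts are $b|V| = 12$ per terminal, $b|E| = 9$ corresponding and $(b^2 - b)|E| = 18$ non-corresponding n-links, and $\binom{3}{2}|V| = 12$ intra-links.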




Following an s-t cut on $G_2$, vertices $v_{ij}$ that remain connected to s are assigned label 0, and the rest, which are connected to t, are assigned label 1 (we could swap 0 and 1 without loss of generality). The string of b binary labels $l_{ij} \in \mathcal{L}_2$ assigned to the $v_{ij}$ is then decoded back into a decimal number indicating the label $l_i \in \mathcal{L}$ assigned to $v_i$ (Figure 2).
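The decoding step is plain base-2 to base-10 conversion with $l_{i1}$ as the most significant bit, as in Figure 2. The sketch below assumes the cut is reported as the set of binary vertices `(i, j)` (0-based j) left on the source side; that input format and the function names are hypothetical.

```python
def decode(bits):
    """Binary labels (l_i1, ..., l_ib), most significant first, to dec()."""
    value = 0
    for bit in bits:
        value = 2 * value + bit
    return value


def encode(label, b):
    """Inverse of decode: the b-bit code (l_i)_2 of a label index."""
    return tuple((label >> (b - 1 - j)) & 1 for j in range(b))


def labels_from_cut(source_side, num_vertices, b):
    """Label 0 for v_ij on the source side of the cut, 1 otherwise, then decode."""
    return [decode(0 if (i, j) in source_side else 1 for j in range(b))
            for i in range(num_vertices)]


# (s, s) -> 00, (s, t) -> 01, (t, s) -> 10, (t, t) -> 11, as in Figure 2(d)
print(labels_from_cut({(0, 0), (0, 1), (1, 0), (2, 1)}, 4, 2))
```

This prints `[0, 1, 2, 3]`.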

It is important to set the edge weights in $E_2$ in such a way that decoding the binary labels resulting from the s-t cut of $G_2$ yields optimal (or as close to optimal as possible) labels for the original multi-label problem. We do not expect to solve the multi-label problem optimally this way, but rather to provide an approximate solution. The second key idea of our method is: derive a system of linear equations capturing the relation between the original multi-label MRF penalties and the s-t cut cost incurred when generating different label configurations, and then calculate the weights of $E_2$ as the least-squares (LS) solution to these equations. In the next sections, we show how we choose the edge weights of $E_2$ in a minimum LS error formulation.
[Figure 2, panels (a)-(d): graph diagrams not reproduced]

Figure 2: Reformulating the multi-label problem as a single s-t cut. (a) A multi-label problem (k-way cut) of labeling vertices $\{v_i\}_{i=1}^{|V|}$ with labels $\mathcal{L}$ (only the n-links are shown). (b) New graph with 2 terminal nodes {s, t}, b = 2 new vertices ($v_{i1}$ and $v_{i2}$ inside the dashed circles) replacing each $v_i$ in (a), and 2 terminal edges for each $v_{ij}$. (c) An s-t cut on (b). (d) Labeling $v_i$ in (a) based on the s-t cut in (c): pairs $(v_{i1}, v_{i2})$ assigned to (s, s) are labeled with the binary string 00, (s, t) with 01, (t, s) with 10, and (t, t) with 11. The binary encodings 00, 01, 10, and 11 in turn reflect the original 4 labels.



2.2 Data term penalty: Severing t-links and intra-links

In the proposed binary formulation, the data term penalty in (1) equals the cost of assigning label $l_i$ to vertex $v_i$ in G(V,E), which entails assigning a corresponding sequence of binary labels $(l_{ij})_{j=1}^{b}$ to $(v_{ij})_{j=1}^{b}$ in $G_2(V_2, E_2)$. To assign $(l_i)_2$ to a string of b vertices, the appropriate terminal links must be cut. To assign a 0 (resp. 1) label to $v_{ij}$, the edge connecting $v_{ij}$ to the terminal t (resp. s) must be severed (Figure 3). Therefore, the local cost (corresponding only to labeling $v_i$) of severing t-links in $G_2$ to assign $l_i$ to vertex $v_i$ in G can be calculated as

$$D_i^{tlinks}(l_i) = \sum_{j=1}^{b} \left( l_{ij}\, w_{v_{ij},s} + \bar{l}_{ij}\, w_{v_{ij},t} \right) \tag{15}$$

where $\bar{l}_{ij}$ denotes the unary complement (NOT) of $l_{ij}$, $w_{v_{ij},s} = w(e_{v_{ij},s})$ is the weight of the edge connecting $v_{ij}$ to s, and, similarly, $w_{v_{ij},t} = w(e_{v_{ij},t})$, with $e_{v_{ij},s} \in E_2^{s}$ and $e_{v_{ij},t} \in E_2^{t}$.

The $G_2$ s-t cut severing the t-links as per (15) will also sever edges in $E_2^{intra}$ (Figure 1 and (14)). In particular, $e_{im,in} \in E_2^{intra}$ will be severed iff the s-t cut leaves $v_{im}$ connected to one terminal, say s (resp. t), while $v_{in}$ remains connected to the other terminal, t (resp. s) (Figure 3). The local cost of severing intra-links in $G_2$ to assign $l_i$ to vertex $v_i$ in G can be calculated as

$$D_i^{intra}(l_i) = \sum_{m=1}^{b-1} \sum_{n=m+1}^{b} (l_{im} \oplus l_{in})\, w_{v_{im},v_{in}} \tag{16}$$

where ⊕ denotes binary XOR, which ensures that the weight of the edge between $v_{im}$ and $v_{in}$ is added to the cut cost iff the cut leaves one of these vertices connected to one terminal (s or t) while the other vertex is connected to the other terminal (t or s). The final data term penalty is the sum of (15) and (16),

$$D_i(l_i) = D_i^{tlinks}(l_i) + D_i^{intra}(l_i). \tag{17}$$
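Equations (15)-(17) can be evaluated directly for a given set of weights; this is the forward model that the least-squares fit of Section 2.4.1 inverts. A minimal sketch, with hypothetical argument names and 0-based bit indices.

```python
from itertools import combinations


def data_cost(bits, w_s, w_t, w_intra):
    """Local cut cost D_i(l_i) of (15)-(17) for one original vertex v_i.

    bits: (l_i1, ..., l_ib), the binary code of l_i
    w_s, w_t: length-b lists of t-link weights w(v_ij, s) and w(v_ij, t)
    w_intra: dict {(m, n): weight} of intra-links, 0 <= m < n < b
    """
    # (15): label 1 severs the s-link, label 0 severs the t-link
    t_links = sum(w_s[j] if l else w_t[j] for j, l in enumerate(bits))
    # (16): an intra-link is severed iff its two end vertices disagree (XOR)
    intra = sum((bits[m] ^ bits[n]) * w_intra[(m, n)]
                for m, n in combinations(range(len(bits)), 2))
    return t_links + intra  # (17)


ws, wt, wi = [1, 2], [3, 4], {(0, 1): 5}
print([data_cost(code, ws, wt, wi) for code in [(0, 0), (0, 1), (1, 0), (1, 1)]])
```

This prints `[7, 10, 10, 3]`; for instance the code 01 severs $e_{v_{i1},t}$ (3), $e_{v_{i2},s}$ (2) and the intra-link (5).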




Figure 3: The $2^b$ ways to cut through $\{v_{ij}\}_{j=1}^{b}$ are shown for b = 2 (left) and for b = 3 (right). Note that the severed t-links and intra-links for each case follow (15) and (16), respectively.
2.3 Prior term penalty: Severing n-links

The vertex interaction penalty $V_{ij}(l_i, l_j, d_i, d_j)$ in (2), for assigning $l_i$ to $v_i$ and $l_j$ to a neighboring $v_j$ in G(V,E), i.e. $e_{ij} \in E$, equals the cost of assigning the sequences of binary labels $(l_{im})_{m=1}^{b}$ to $(v_{im})_{m=1}^{b}$ and $(l_{jn})_{n=1}^{b}$ to $(v_{jn})_{n=1}^{b}$ in $G_2(V_2, E_2)$. The local cost (corresponding only to labeling $v_i$ and $v_j$) of this cut can be calculated as (Figure 4)

$$V_{ij}(l_i, l_j, d_i, d_j) = \sum_{m=1}^{b} \sum_{n=1}^{b} (l_{im} \oplus l_{jn})\, w_{v_{im},v_{jn}}. \tag{18}$$

This effectively adds the weight of the edge between $v_{im}$ and $v_{jn}$ to the cut cost iff the cut leaves one vertex of the edge connected to one terminal (s or t) while the other vertex is connected to the other terminal (t or s). Note that we impose no restrictions on the left-hand side of (18); e.g. it could reflect non-convex or non-metric priors, and can be spatially varying. Essentially, for every pair (i, j), $V_{ij}(l_i, l_j, d_i, d_j)$ must only return a non-negative scalar. As special cases, $V_{ij}$ could be $V^l_{ij}(l_i, l_j)$, $V^d_{ij}(d_i, d_j)$, or $V^l_{ij}(l_i, l_j)\, V^d_{ij}(d_i, d_j)$.
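Equation (18) as code: a sketch (hypothetical function name) where `w_n[m][n]` holds the weight of the n-link between $v_{im}$ and $v_{jn}$, with 0-based bit indices.

```python
def interaction_cost(bits_i, bits_j, w_n):
    """Local n-link cut cost V_ij of (18) between neighbours v_i and v_j.

    An n-link (v_im, v_jn) is severed iff its end vertices disagree (XOR).
    """
    return sum((lm ^ ln) * w_n[m][n]
               for m, lm in enumerate(bits_i)
               for n, ln in enumerate(bits_j))


w = [[1, 2], [3, 4]]  # w_{i1,j1}, w_{i1,j2} / w_{i2,j1}, w_{i2,j2}
print(interaction_cost((0, 0), (0, 1), w))  # severs w_{i1,j2} and w_{i2,j2}
```

This prints `6`; identical codes sever no n-link, so $V_{ij}(l, l) = 0$ for every label under this model.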

Figure 4: Severing n-links between neighboring vertices $v_p$ and $v_q$ for b = 2 (four such examples are shown in the left column) and b = 3 (three examples in the right column). The cut is depicted as a red curve. In the last two examples for b = 3, the colored vertices are translated while maintaining the n-links in order to clearly show that the severed n-links for each case follow (18).

2.4 Edge weight approximation with least squares

Equations (17) and (18) dictate the relationship between the penalties of the data and prior terms ($D_i$ and $V_{ij}$) of the original multi-label problem (that of G(V,E)) and the severed edge weights of the binary s-t cut formulation ($G_2(V_2, E_2)$). What remains before applying the s-t cut, however, is to find the edge weights for the binary problem, i.e. $w(e_{v_{im},v_{jn}}) = w_{ij,mn}\ \forall e_{ij,mn} \in E_2$.

2.4.1 Edge weights of t-links and intra-links

For b = 1 (i.e. binary labeling), (16) simplifies to

$$D_i^{intra}(l_i) = 0 \tag{19}$$

and (15) and (17) simplify to

$$D_i(l_i) = D_i^{tlinks}(l_i) + 0 = l_{i1}\, w_{v_{i1},s} + \bar{l}_{i1}\, w_{v_{i1},t}. \tag{20}$$

With $l_i = l_{i1}$ for b = 1, substituting the two possible values for $l_i$, $l_i = l_0$ and $l_i = l_1$, we obtain these two equations

$$\begin{aligned}
l_i = l_0 = 0:&\quad D_i(l_0) = 0\, w_{v_{i1},s} + 1\, w_{v_{i1},t}\\
l_i = l_1 = 1:&\quad D_i(l_1) = 1\, w_{v_{i1},s} + 0\, w_{v_{i1},t}
\end{aligned} \tag{21}$$

which can be written in matrix form $A_1 X_1^i = B_1^i$ as

$$\underbrace{\begin{pmatrix} 0 & 1\\ 1 & 0 \end{pmatrix}}_{A_1} \underbrace{\begin{pmatrix} w_{v_{i1},s}\\ w_{v_{i1},t} \end{pmatrix}}_{X_1^i} = \underbrace{\begin{pmatrix} D_i(l_0)\\ D_i(l_1) \end{pmatrix}}_{B_1^i} \tag{22}$$

where $X_1^i$ is the vector of unknown weights of the edges connecting vertex $v_{i1}$ to s and t, $B_1^i$ is the data term penalty for $v_i$, and $A_1$ is the matrix of coefficients. The subscript 1 in $A_1$, $X_1^i$, and $B_1^i$ indicates that this matrix equation is for the case of b = 1. Clearly, the solution to (22) is trivial:

$$w_{v_{i1},s} = D_i(l_1) \quad \text{and} \quad w_{v_{i1},t} = D_i(l_0), \tag{23}$$

i.e. the higher the penalty of assigning $l_1$ to $v_i$, the more costly it is to sever $e_{v_{i1},s}$ and hence the more likely it is to assign $l_0$ to $v_i$, and vice versa. This agrees with what we expect in the binary case. The more interesting cases are when b > 1.

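For b = 2, we address multi-label problems with $2^{b-1} = 2 < k \le 2^b = 4$ labels, i.e. k = 3 or k = 4. Substituting the $2^b = 4$ possible label values, (0,0), (0,1), (1,0), and (1,1), of $(l_i)_2 = (l_{i1} l_{i2})_2$ into (17), we obtain

$$D_i(l_i) = \sum_{j=1}^{2} \left( l_{ij}\, w_{v_{ij},s} + \bar{l}_{ij}\, w_{v_{ij},t} \right) + \sum_{m=1}^{1} \sum_{n=m+1}^{2} (l_{im} \oplus l_{in})\, w_{v_{im},v_{in}} \tag{24}$$

$$\begin{aligned}
(0,0):&\quad D_i(l_0) = 0\, w_{v_{i1},s} + 1\, w_{v_{i1},t} + 0\, w_{v_{i2},s} + 1\, w_{v_{i2},t} + 0\, w_{v_{i1},v_{i2}}\\
(0,1):&\quad D_i(l_1) = 0\, w_{v_{i1},s} + 1\, w_{v_{i1},t} + 1\, w_{v_{i2},s} + 0\, w_{v_{i2},t} + 1\, w_{v_{i1},v_{i2}}\\
(1,0):&\quad D_i(l_2) = 1\, w_{v_{i1},s} + 0\, w_{v_{i1},t} + 0\, w_{v_{i2},s} + 1\, w_{v_{i2},t} + 1\, w_{v_{i1},v_{i2}}\\
(1,1):&\quad D_i(l_3) = 1\, w_{v_{i1},s} + 0\, w_{v_{i1},t} + 1\, w_{v_{i2},s} + 0\, w_{v_{i2},t} + 0\, w_{v_{i1},v_{i2}}
\end{aligned} \tag{25}$$

which can be written in matrix form $A_2 X_2^i = B_2^i$ as

$$\underbrace{\begin{pmatrix} 0 & 1 & 0 & 1 & 0\\ 0 & 1 & 1 & 0 & 1\\ 1 & 0 & 0 & 1 & 1\\ 1 & 0 & 1 & 0 & 0 \end{pmatrix}}_{A_2} \underbrace{\begin{pmatrix} w_{v_{i1},s}\\ w_{v_{i1},t}\\ w_{v_{i2},s}\\ w_{v_{i2},t}\\ w_{v_{i1},v_{i2}} \end{pmatrix}}_{X_2^i} = \underbrace{\begin{pmatrix} D_i(l_0)\\ D_i(l_1)\\ D_i(l_2)\\ D_i(l_3) \end{pmatrix}}_{B_2^i} \tag{26}$$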



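Similarly, for b = 3 (k = 5, 6, 7, or 8), we write $2^b = 8$ equations in the linear system of equations $A_3 X_3^i = B_3^i$, where

$$X_3^i = \left(w_{v_{i1},s}, w_{v_{i1},t}, w_{v_{i2},s}, w_{v_{i2},t}, w_{v_{i3},s}, w_{v_{i3},t}, w_{v_{i1},v_{i2}}, w_{v_{i1},v_{i3}}, w_{v_{i2},v_{i3}}\right)^T \tag{27}$$

$$A_3 = \begin{pmatrix}
0&1&0&1&0&1&0&0&0\\
0&1&0&1&1&0&0&1&1\\
0&1&1&0&0&1&1&0&1\\
0&1&1&0&1&0&1&1&0\\
1&0&0&1&0&1&1&1&0\\
1&0&0&1&1&0&1&0&1\\
1&0&1&0&0&1&0&1&1\\
1&0&1&0&1&0&0&0&0
\end{pmatrix} \tag{28}$$

$$B_3^i = \left(D_i(l_0), D_i(l_1), \ldots, D_i(l_7)\right)^T. \tag{29}$$

In general, for any b, we have

$$A_b X_b^i = B_b^i \tag{30}$$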
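where $X_b^i$ is a column vector of length $2b + \binom{b}{2}$,

$$X_b^i = \left(w_{v_{i1},s}, w_{v_{i1},t}, \ldots, w_{v_{ib},s}, w_{v_{ib},t}, w_{v_{i1},v_{i2}}, w_{v_{i1},v_{i3}}, \ldots, w_{v_{i(b-1)},v_{ib}}\right)^T, \tag{31}$$

$A_b$ is a $2^b \times \left(2b + \binom{b}{2}\right)$ matrix whose r-th row $A_b(r, :)$, with $r = \mathrm{dec}(l_{i1} l_{i2} \cdots l_{ib})$, is

$$A_b(\mathrm{dec}(l_{i1} l_{i2} \cdots l_{ib}), :) = \left(l_{i1}, \bar{l}_{i1}, l_{i2}, \bar{l}_{i2}, \ldots, l_{ib}, \bar{l}_{ib}, l_{i1} \oplus l_{i2}, l_{i1} \oplus l_{i3}, \ldots, l_{i1} \oplus l_{ib}, l_{i2} \oplus l_{i3}, \ldots, l_{i(b-1)} \oplus l_{ib}\right) \tag{32}$$

where dec(·) is the decimal equivalent of its binary argument (rows are indexed from 0), and $B_b^i$ is a $2^b$-long column vector given by

$$B_b^i = \left(D_i(l_0), D_i(l_1), D_i(l_2), \ldots, D_i(l_{2^b-1})\right)^T. \tag{33}$$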

We can now solve the linear system of equations in (30) and find the optimal, in an LS sense, t-link and intra-link edge weights related to every vertex $v_i$ using

$$X_b^i = A_b^{+} B_b^i \tag{34}$$

where $A_b^{+}$ is the (Moore-Penrose) pseudoinverse of $A_b$, calculated using the singular value decomposition (SVD): if $A_b = U \Sigma V^*$ is the SVD of $A_b$, then $A_b^{+} = V \Sigma^{+} U^*$, where $\Sigma^{+}$ is obtained by replacing each non-zero diagonal element of $\Sigma$ with its reciprocal and transposing the result [ , 34, 35].

Solving (34) for every vertex $v_i$, we obtain the weights of all edges in $E_2^{tlinks} \cup E_2^{intra}$. Note that $A_b$ and, more importantly, $A_b^{+}$ are easily pre-computed off-line only once for each b value, as they do not change for different vertices or for different graphs.
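A minimal NumPy sketch of (30)-(34), assuming the column order of (31) and row order of (32); the helper names are ours. `numpy.linalg.pinv` computes the Moore-Penrose pseudoinverse via the SVD, matching the description above, and the sketch does not constrain the resulting weights to be non-negative. A quick rank check is included: $A_2$ has full row rank (so any $B_2^i$ is reproduced exactly), whereas $A_3$ has rank 7 < 8, so for b = 3 the fit is a genuine least-squares approximation.

```python
from itertools import combinations

import numpy as np


def build_A(b):
    """Coefficient matrix A_b of (30)-(32): 2^b rows, 2b + C(b, 2) columns.

    Row r belongs to the label whose binary code (l_i1 ... l_ib), most
    significant bit first, equals r; columns follow the order of X_b^i (31).
    """
    rows = []
    for r in range(2 ** b):
        bits = [(r >> (b - 1 - j)) & 1 for j in range(b)]
        row = []
        for l in bits:
            row += [l, 1 - l]  # coefficients of w(v_ij, s) and w(v_ij, t)
        row += [bits[m] ^ bits[n] for m, n in combinations(range(b), 2)]
        rows.append(row)
    return np.array(rows, dtype=float)


def tlink_intra_weights(A_pinv, D):
    """X_b^i = A_b^+ B_b^i of (34), with D = (D_i(l_0), ..., D_i(l_{2^b-1})).

    Least squares does not force the weights to be non-negative.
    """
    return A_pinv @ np.asarray(D, dtype=float)


# A_b^+ depends only on b: compute it once and reuse it for every vertex
A2 = build_A(2)
A2_pinv = np.linalg.pinv(A2)
X = tlink_intra_weights(A2_pinv, [4.0, 1.0, 3.0, 2.0])
print(np.allclose(A2 @ X, [4.0, 1.0, 3.0, 2.0]))  # True: exact fit for b = 2
print(np.linalg.matrix_rank(build_A(3)))          # 7: b = 3 is approximate
```

For b = 1 the same code returns $(D_i(l_1), D_i(l_0))$, i.e. the trivial solution (23).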



2.4.2 Edge weights of n-links 

For b = 1 (i.e. binary labeling), (18) simplifies to

$$(l_i \oplus l_j)\, w_{ij} = V_{ij}(l_i, l_j, d_i, d_j) \tag{35}$$

where $w_{v_{i1},v_{j1}}$ has been replaced by $w_{ij}$, and $l_{i1}$ and $l_{j1}$ have been replaced by $l_i$ and $l_j$, since they are equivalent in the b = 1 case.

In the case when the vertex interaction depends on the data only and is independent of the labels $l_i$ and $l_j$, i.e. $V_{ij}(l_i, l_j, d_i, d_j) = V^d_{ij}(d_i, d_j)$, we can simply ignore the outcome of $l_i \oplus l_j$ and set it to a constant 1/c; the solution is then trivially $w_{ij} = c\, V^d_{ij}(d_i, d_j)$, which agrees with (4). However, in the general case, when $V_{ij}$ depends on the labels $l_i$ and $l_j$ of the neighboring vertices $v_i$ and $v_j$, a single edge weight is insufficient to capture such elaborate label interactions, essentially because $w_{ij}$ would need to take on a different value for every pair of labels.

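To address this problem, we substitute into (18) each of the $|\mathcal{L}|^2 = 2^{2b} = 2^2 = 4$ possible combinations of pairs of labels $(l_i, l_j) \in \{l_0, l_1\} \times \{l_0, l_1\} = \{0, 1\} \times \{0, 1\}$, and obtain:

$$\begin{aligned}
(l_0,l_0) = (0,0):&\quad (0 \oplus 0)\, w_{ij} = V_{ij}(l_0, l_0, d_i, d_j)\\
(l_0,l_1) = (0,1):&\quad (0 \oplus 1)\, w_{ij} = V_{ij}(l_0, l_1, d_i, d_j)\\
(l_1,l_0) = (1,0):&\quad (1 \oplus 0)\, w_{ij} = V_{ij}(l_1, l_0, d_i, d_j)\\
(l_1,l_1) = (1,1):&\quad (1 \oplus 1)\, w_{ij} = V_{ij}(l_1, l_1, d_i, d_j)
\end{aligned} \tag{36}$$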
which is written in matrix form $S_1 Y_1^{ij} = T_1^{ij}$ as

$$\underbrace{\begin{pmatrix} 0\\ 1\\ 1\\ 0 \end{pmatrix}}_{S_1} \underbrace{w_{ij}}_{Y_1^{ij}} = \underbrace{\begin{pmatrix} V_{ij}(l_0, l_0, d_i, d_j)\\ V_{ij}(l_0, l_1, d_i, d_j)\\ V_{ij}(l_1, l_0, d_i, d_j)\\ V_{ij}(l_1, l_1, d_i, d_j) \end{pmatrix}}_{T_1^{ij}} \tag{37}$$

where $Y_1^{ij}$ is the unknown n-link weight $w_{ij}$ connecting $v_i$ to the neighboring $v_j$. As before, subscript 1 indicates b = 1. The first and fourth equations capture the condition that, in order to guarantee the same label for neighboring vertices, the edge weight should be infinite and, hence, never severed. Solving for $w_{ij}$ using the pseudoinverse gives

$$w_{ij} = S_1^{+} T_1^{ij} = \tfrac{1}{2}\left(V_{ij}(l_0, l_1, d_i, d_j) + V_{ij}(l_1, l_0, d_i, d_j)\right), \quad \text{since } S_1^{+} = (0, 0.5, 0.5, 0),$$

i.e. $w_{ij}$ is equal to the average of the interaction penalties of the two cases in which the labels are different.
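Because $S_1$ is a single non-zero column s, its pseudoinverse reduces to $s^T / (s^T s) = (0, 0.5, 0.5, 0)$, so the LS weight is just a projection onto s. A pure-Python sketch of that special case (the function name is hypothetical):

```python
def nlink_weight_b1(T):
    """w_ij = S_1^+ T_1^ij for b = 1, where S_1 = (0, 1, 1, 0)^T.

    T = (V_ij(l_0, l_0), V_ij(l_0, l_1), V_ij(l_1, l_0), V_ij(l_1, l_1)).
    """
    s = (0, 1, 1, 0)
    return sum(a * v for a, v in zip(s, T)) / sum(a * a for a in s)


print(nlink_weight_b1((0, 2, 4, 0)))
```

This prints `3.0`, the average of the two unequal-label penalties; the same-label entries $V_{ij}(l_0,l_0)$ and $V_{ij}(l_1,l_1)$ receive zero weight in the fit.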

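For b = 2, (18) simplifies to

$$V_{ij}(l_i, l_j, d_i, d_j) = (l_{i1} \oplus l_{j1})\, w_{v_{i1},v_{j1}} + (l_{i1} \oplus l_{j2})\, w_{v_{i1},v_{j2}} + (l_{i2} \oplus l_{j1})\, w_{v_{i2},v_{j1}} + (l_{i2} \oplus l_{j2})\, w_{v_{i2},v_{j2}} \tag{38}$$

We can now substitute all $2^{2b} = 2^4 = 16$ possible combinations of the pairs of interacting labels $(l_i, l_j) \in \{l_0, l_1, l_2, l_3\} \times \{l_0, l_1, l_2, l_3\}$, or equivalently, $((l_i)_2, (l_j)_2) \in \{00, 01, 10, 11\} \times \{00, 01, 10, 11\}$. Here are some examples:

$$\begin{aligned}
(l_0,l_0) = (00,00):&\quad V_{ij}(l_0,l_0,d_i,d_j) = 0\, w_{v_{i1},v_{j1}} + 0\, w_{v_{i1},v_{j2}} + 0\, w_{v_{i2},v_{j1}} + 0\, w_{v_{i2},v_{j2}}\\
(l_0,l_1) = (00,01):&\quad V_{ij}(l_0,l_1,d_i,d_j) = 0\, w_{v_{i1},v_{j1}} + 1\, w_{v_{i1},v_{j2}} + 0\, w_{v_{i2},v_{j1}} + 1\, w_{v_{i2},v_{j2}}\\
(l_0,l_2) = (00,10):&\quad V_{ij}(l_0,l_2,d_i,d_j) = 1\, w_{v_{i1},v_{j1}} + 0\, w_{v_{i1},v_{j2}} + 1\, w_{v_{i2},v_{j1}} + 0\, w_{v_{i2},v_{j2}}\\
(l_0,l_3) = (00,11):&\quad V_{ij}(l_0,l_3,d_i,d_j) = 1\, w_{v_{i1},v_{j1}} + 1\, w_{v_{i1},v_{j2}} + 1\, w_{v_{i2},v_{j1}} + 1\, w_{v_{i2},v_{j2}}\\
(l_1,l_0) = (01,00):&\quad V_{ij}(l_1,l_0,d_i,d_j) = 0\, w_{v_{i1},v_{j1}} + 0\, w_{v_{i1},v_{j2}} + 1\, w_{v_{i2},v_{j1}} + 1\, w_{v_{i2},v_{j2}}\\
(l_2,l_1) = (10,01):&\quad V_{ij}(l_2,l_1,d_i,d_j) = 1\, w_{v_{i1},v_{j1}} + 0\, w_{v_{i1},v_{j2}} + 0\, w_{v_{i2},v_{j1}} + 1\, w_{v_{i2},v_{j2}}\\
&\quad\ \vdots\\
(l_3,l_3) = (11,11):&\quad V_{ij}(l_3,l_3,d_i,d_j) = 0\, w_{v_{i1},v_{j1}} + 0\, w_{v_{i1},v_{j2}} + 0\, w_{v_{i2},v_{j1}} + 0\, w_{v_{i2},v_{j2}}
\end{aligned} \tag{39}$$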

Writing all 16 equations, we obtain the linear system of equations in matrix form $S_2 Y_2^{ij} = T_2^{ij}$, where $Y_2^{ij} = (w_{v_{i1},v_{j1}}, w_{v_{i1},v_{j2}}, w_{v_{i2},v_{j1}}, w_{v_{i2},v_{j2}})^T$ is the 4 × 1 vector of unknown n-link edge weights, $T_2^{ij}$ is a 16 × 1 vector whose entries are the different possible interaction penalties $\left((V_{ij}(l_m, l_n, d_i, d_j))_{n=0}^{3}\right)_{m=0}^{3}$, and $S_2$ is a 16 × 4 matrix with 0 or 1 entries resulting from (38), as follows

$$S_2^T = \begin{pmatrix}
0&0&1&1&0&0&1&1&1&1&0&0&1&1&0&0\\
0&1&0&1&0&1&0&1&1&0&1&0&1&0&1&0\\
0&0&1&1&1&1&0&0&0&0&1&1&1&1&0&0\\
0&1&0&1&1&0&1&0&0&1&0&1&1&0&1&0
\end{pmatrix} \tag{40}$$



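In general, for any b, we have the following linear system of equations

$$S_b Y_b^{ij} = T_b^{ij} \tag{41}$$

where $Y_b^{ij}$ is the $b^2 \times 1$ vector of unknown n-link edge weights, $S_b$ is a $2^{2b} \times b^2$ matrix of 0s and 1s, and $T_b^{ij}$ is a $2^{2b} \times 1$ vector of interaction penalties, i.e.

$$S_b = \begin{pmatrix} s_{0,0}\\ s_{0,1}\\ \vdots\\ s_{0,2^b-1}\\ s_{1,0}\\ \vdots\\ s_{2^b-1,2^b-1} \end{pmatrix}, \quad
Y_b^{ij} = \begin{pmatrix} w_{v_{i1},v_{j1}}\\ w_{v_{i1},v_{j2}}\\ \vdots\\ w_{v_{ib},v_{jb}} \end{pmatrix}, \quad
T_b^{ij} = \begin{pmatrix} V_{ij}(l_0, l_0, d_i, d_j)\\ V_{ij}(l_0, l_1, d_i, d_j)\\ \vdots\\ V_{ij}(l_0, l_{2^b-1}, d_i, d_j)\\ V_{ij}(l_1, l_0, d_i, d_j)\\ \vdots\\ V_{ij}(l_{2^b-1}, l_{2^b-1}, d_i, d_j) \end{pmatrix} \tag{42}$$

where $s_{m,n}$ is the row of $S_b$ corresponding to $(l_i, l_j) = (l_m, l_n)$ and is given by

$$s_{m,n} = \left(l_{m1} \oplus l_{n1}, l_{m1} \oplus l_{n2}, \ldots, l_{m1} \oplus l_{nb}, l_{m2} \oplus l_{n1}, \ldots, l_{mb} \oplus l_{nb}\right) \tag{43}$$

with $(l_m)_2 = (l_{m1} l_{m2} \cdots l_{mb})_2$.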



We now solve the linear system of equations in (41) to find the optimal, in an LS sense, n-link edge weights $Y_b^{ij}$ related to a pair of vertices $v_i$ and $v_j$ using

$$Y_b^{ij} = S_b^{+} T_b^{ij} \tag{44}$$

Similar to what we noted for (34), $S_b$ and $S_b^{+}$ are pre-computed off-line only once for each b value.
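A NumPy sketch of (41)-(44) under the row order of (42) (row $2^b m + n$ for the pair $(l_m, l_n)$) and the column order of (43); the helper names are hypothetical. The construction reproduces the rows of $S_2^T$ in (40), and because $S_2$ has full column rank, penalties generated by any weight vector through (38) are recovered exactly by the pseudoinverse.

```python
from itertools import product

import numpy as np


def build_S(b):
    """The 2^(2b) x b^2 matrix S_b of (41)-(43).

    Row 2^b * m + n corresponds to (l_i, l_j) = (l_m, l_n); column b * p + q
    holds bit p of l_m XOR bit q of l_n, matching the order of Y_b^ij.
    """
    def bits(label):
        return [(label >> (b - 1 - p)) & 1 for p in range(b)]

    rows = [[bm ^ bn for bm in bits(m) for bn in bits(n)]
            for m, n in product(range(2 ** b), repeat=2)]
    return np.array(rows, dtype=float)


def nlink_weights(S_pinv, V, b):
    """Y_b^ij = S_b^+ T_b^ij of (44); V(m, n) returns V_ij(l_m, l_n, d_i, d_j)."""
    T = np.array([V(m, n) for m, n in product(range(2 ** b), repeat=2)],
                 dtype=float)
    return S_pinv @ T


S2 = build_S(2)
S2_pinv = np.linalg.pinv(S2)  # pre-computed once per b
Y_potts = nlink_weights(S2_pinv, lambda m, n: float(m != n), 2)  # Potts prior
```

For a prior that no choice of four weights can reproduce, such as the Potts model above, `Y_potts` is only the least-squares compromise, which is where the approximation in this method comes from.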

Solving (44) for every pair of neighboring vertices $v_i$ and $v_j$, we obtain the weights of all n-link edges, and solving (34) for every vertex $v_i$, we obtain the weights of all t-link and intra-link edges, i.e. all edge weights $w_{ij,mn}$, $\forall e_{ij,mn} \in E_2$, are now known. We then calculate the minimal s-t cut of $G_2$ to obtain the binary labeling of every vertex in $V_2 = \{\{v_{im}\}_{m=1}^{b}\}_{i=1}^{|V|}$. Finally, every sequence of $b$ binary labels $(v_{im})_{m=1}^{b}$ is decoded to a decimal label $l_i \in L = \{l_0, l_1, \ldots, l_{k-1}\}$, $\forall v_i \in V$, i.e. the solution to the original multi-label MRF problem.
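A minimal sketch of the decoding step, assuming (as in (40)) that $v_{i1}$ carries the most significant bit and that the s-t labels have already been mapped to bits; which terminal side is read as 0 and which as 1 is our assumption, and `decode_labels` is a hypothetical helper.

```python
def decode_labels(binary_labels):
    """Decode each vertex's b binary labels (MSB first) into a decimal label index.

    binary_labels: dict mapping an original vertex v_i to the list of bits of
    v_i1, ..., v_ib (assumed convention: 0 = source side, 1 = sink side).
    """
    decoded = {}
    for vertex, bits in binary_labels.items():
        value = 0
        for bit in bits:            # Horner's rule: shift in one bit at a time
            value = (value << 1) | bit
        decoded[vertex] = value     # index m of the label l_m
    return decoded


cut_result = {"v1": [0, 0], "v2": [1, 0], "v3": [1, 1]}  # hypothetical b = 2 cut
print(decode_labels(cut_result))  # {'v1': 0, 'v2': 2, 'v3': 3}
```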



2.5 Gray encoding for extra labels 

In cases when $b$ bits are needed to represent $k$ labels (according to (10)) but $k < 2^b$, e.g. when $b = 2$ and $2^b = 4$ but $k = 3$, or when $b = 3$ and $2^b = 8$ but $k = 5$, 6, or 7 (but not 8), we have what we call extra or unused labels: the $n$th label $l_{n-1}$ is extra iff $k < n \leq 2^b$ (recall that $L = \{l_0, l_1, \ldots, l_{k-1}\}$), e.g. the 4th label is an extra label when $k = 3$, the 6th, 7th and 8th labels are extra labels when $k = 5$, etc. Following an s-t cut, we can, in general, end up with these extra labels and must, therefore, replace or merge them with one of the non-extra, or used, labels: the $m$th label $l_{m-1}$ is a non-extra label iff $1 \leq m \leq k$. If label $l_n$ is an extra label to be replaced with label $l_m$, then we must replace $D_i(l_n)$ with $D_i(l_m)$ when substituting (as in (25)) all possible label values in (17). Similarly, we must replace $V_{ij}(l_n, l_j, d_i, d_j)$ by $V_{ij}(l_m, l_j, d_i, d_j)$ and $V_{ij}(l_i, l_n, d_i, d_j)$ by $V_{ij}(l_i, l_m, d_i, d_j)$ when substituting (as in (39)) all possible combinations of the pairs of interacting labels in (18). Rather than merging arbitrary labels, we adopt a Gray encoding scheme. That is, we minimize the Hamming distance (HD)¹ between the binary codes of a pair of merged labels. For example, we favor merging label 0001 with 1001 (HD = 1) over merging 0001 with 0010 (HD = 2). To implement this, we first note that the most significant bit of the binary code of an extra label will always be 1 (if it were not, fewer bits would suffice). Then, each extra label is merged with the non-extra label whose binary code is identical to that of the extra label except for having 0 as its most significant bit, thus guaranteeing HD = 1 for all pairs of merged labels. For example, 101 will be merged with 001, 111 with 011, etc., or, more generally, $(l_n)_2 = (1, l_{n2}, \ldots, l_{nb})_2$ is merged with $(l_m)_2 = (0, l_{n2}, \ldots, l_{nb})_2$.
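The merging rule can be written in a few lines. The sketch below (hypothetical helper names) computes $b = \lceil \log_2 k \rceil$ with integer arithmetic, maps every extra label index $n$, $k \leq n < 2^b$, to $n$ with its most significant bit cleared, checks that each merged pair has HD = 1, and shows how a data penalty vector $D_i$ could be extended to all $2^b$ codes by copying the penalty of the merged label.

```python
def bits_needed(k):
    """b = ceil(log2(k)) for k >= 2, computed exactly with integers."""
    return (k - 1).bit_length()


def hamming_distance(x, y):
    """Number of bit positions in which two label indices differ."""
    return bin(x ^ y).count("1")


def merge_extra_labels(k):
    """Map each extra label index n (k <= n < 2^b) to a used label index."""
    b = bits_needed(k)
    msb = 1 << (b - 1)
    mapping = {}
    for n in range(k, 2 ** b):
        m = n ^ msb  # clear the most significant bit (always 1 for extra labels)
        assert m < k and hamming_distance(n, m) == 1
        mapping[n] = m
    return mapping


def resolve_label(index, k):
    """Replace a decoded (possibly extra) label index by a used one."""
    return merge_extra_labels(k).get(index, index)


def extend_data_penalty(D, k):
    """Extend D_i(l_0..l_{k-1}) to all 2^b codes: D_i(l_n) := D_i(l_m) for merged n."""
    mapping = merge_extra_labels(k)
    return list(D) + [D[mapping[n]] for n in range(k, 2 ** bits_needed(k))]


print(merge_extra_labels(5))  # {5: 1, 6: 2, 7: 3}
print(merge_extra_labels(3))  # {3: 1}
```

For $k = 5$ the extra codes 101, 110 and 111 are merged with 001, 010 and 011, respectively.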

3 Results 

3.1 LS error and rank deficiency analysis 

The approximation error for general LS problems is a well-studied topic [23, 20, …]. In our method, to estimate the edge weights of t-links and intra-links in (34), a system of $2^b$ linear equations is solved for $2b + \binom{b}{2}$ unknowns, compared to $2^{2b}$ equations and $b^2$ unknowns when estimating the n-link edge weights in (44). Table 1 summarizes the number of equations, the number of unknowns, and the ranks of $A_b$ and $S_b$ for different values of $b$. Note that the only square, full-rank case is $A_1$ (i.e. binary segmentation). $A_b$ is underdetermined for $b = 2, 3$ and overdetermined for $b \geq 4$. All cases of $S_b$ are overdetermined, with a rank ($b^2$) far below their number of equations ($2^{2b}$).
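The $S_b$ entries of Table 1 can be cross-checked with the sketch below, which rebuilds $S_b$ from (43) (most significant bit first, as in (40)), computes its exact rank over the rationals, and counts the equations $2^b$ and unknowns $2b + \binom{b}{2}$ of $A_b$. It does not rebuild $A_b$ itself, since (30) is not reproduced in this section. Reading the sparse n-link case as keeping only the $b$ n-links between $v_{ip}$ and $v_{jp}$ is our assumption, chosen because it matches $u_0 = b$ in Table 1; all helper names are hypothetical.

```python
from fractions import Fraction
from math import comb


def build_S(b, sparse=False):
    """S_b from (43); with sparse=True keep only the n-links between v_ip and v_jp."""
    pairs = [(p, q) for p in range(b) for q in range(b) if not sparse or p == q]
    rows = []
    for m in range(2 ** b):
        for n in range(2 ** b):
            lm = [(m >> (b - 1 - k)) & 1 for k in range(b)]  # MSB first
            ln = [(n >> (b - 1 - k)) & 1 for k in range(b)]
            rows.append([lm[p] ^ ln[q] for p, q in pairs])
    return rows


def rank(M):
    """Exact rank via Gaussian elimination over the rationals."""
    rows = [[Fraction(x) for x in row] for row in M]
    r = 0
    for col in range(len(rows[0])):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            if rows[i][col] != 0:
                f = rows[i][col] / rows[r][col]
                rows[i] = [x - f * y for x, y in zip(rows[i], rows[r])]
        r += 1
    return r


def s_b_properties(b):
    """(e, u, r, u0, r0) for S_b, in the order of the S_b columns of Table 1."""
    full, sparse = build_S(b), build_S(b, sparse=True)
    return len(full), len(full[0]), rank(full), len(sparse[0]), rank(sparse)


def a_b_counts(b):
    """Equations 2^b and unknowns 2b + C(b, 2) of A_b (its rank is not computed here)."""
    return 2 ** b, 2 * b + comb(b, 2)


for b in range(1, 5):
    print(b, a_b_counts(b), s_b_properties(b))
```

For $b = 4$ the last printed line contains (16, 14) and (256, 16, 16, 4, 4), i.e. the corresponding counts of Table 1.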

We present, in Figure 5, empirical results of the LS error $e_t$ when solving for the edge weights of t-links and intra-links (cf. Section 2.4.1, (30), (34)), and, in Figure 6, the error $e_n$ of n-links (cf. Section 2.4.2 and (41), (44)). $e_t$ and $e_n$ are given by

$$e_t = \frac{|T_b^t - \hat{T}_b^t|}{|T_b^t|} = \frac{|(I - A_b A_b^+)\,T_b^t|}{|T_b^t|} \qquad (45)$$

$$e_n = \frac{|T_b^n - \hat{T}_b^n|}{|T_b^n|} = \frac{|(I - S_b S_b^+)\,T_b^n|}{|T_b^n|} \qquad (46)$$

where $I$ is the identity matrix, $\hat{T}$ denotes the right-hand side reproduced by the LS solution, and $|\cdot|$ is the $\ell^2$-norm. Note how the error in Figure 5 starts at exactly zero for binary segmentation ($b = 1$), as expected. With increasing number of labels, the average error increases with an (empirical) upper bound of 0.5, whereas the error variance decreases. In Figure 6, the error is non-zero even for binary segmentation (Section 2.4.2), and it converges to 0.5 as the number of labels increases. The plots are the result of a Monte Carlo simulation of 500 random realizations of the constant vectors $T_b^t$ and $T_b^n$ (the right-hand sides of (30) and (41)) for each number of labels.

¹ The Hamming distance between two strings of equal length (two binary codes in our case) is the number of positions at which the corresponding symbols (or bits) differ.

b | A_b (30): e | u | r | u_0 | r_0 | S_b (41): e | u | r | u_0 | r_0
4 | 16 | 14 | 11 | 8 | 5 | 256 | 16 | 16 | 4 | 4
5 | 32 | 20 | 16 | 10 | 6 | 1024 | 25 | 25 | 5 | 5
6 | 64 | 27 | 22 | 12 | 7 | 4096 | 36 | 36 | 6 | 6
7 | 128 | 35 | 29 | 14 | 8 | 16384 | 49 | 49 | 7 | 7
8 | 256 | 44 | 37 | 16 | 9 | 65536 | 64 | 64 | 8 | 8

Table 1: Properties of the systems of linear equations. For different numbers of bits $b$, the table lists the number of equations $e$, the number of unknowns $u$, and the rank $r$ of the matrices of coefficients $A_b$ (cf. (30) in Section 2.4.1) and $S_b$ (cf. (41) in Section 2.4.2). $u_0$ and $r_0$ reflect the case when, for $A_b$, intra-links are not used (i.e. the set of intra-links in (11) is empty) and, for $S_b$, only sparse n-links are used (cf. (13)).
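The relative error (46) and a Monte Carlo average over random right-hand sides can be sketched as follows. Assumptions: the entries of $T_b^n$ are drawn uniformly from [0, 1), and the LS solution is computed from the normal equations, which is valid for $S_b$ (full column rank) but not for the rank-deficient $A_b$ in (45), where a true pseudoinverse (e.g. NumPy's `numpy.linalg.pinv`) would be needed. This illustrates the error measure only; it is not a reproduction of Figure 6, and the helper names are hypothetical.

```python
import math
import random


def build_S(b):
    """S_b from (43), MSB-first bit order as in (40)."""
    rows = []
    for m in range(2 ** b):
        for n in range(2 ** b):
            lm = [(m >> (b - 1 - k)) & 1 for k in range(b)]
            ln = [(n >> (b - 1 - k)) & 1 for k in range(b)]
            rows.append([lm[p] ^ ln[q] for p in range(b) for q in range(b)])
    return rows


def ls_solve(S, T):
    """LS solution via the normal equations (valid because S_b has full column rank)."""
    u = len(S[0])
    A = [[float(sum(r[a] * r[c] for r in S)) for c in range(u)]
         + [sum(r[a] * t for r, t in zip(S, T))] for a in range(u)]
    for col in range(u):  # Gauss-Jordan elimination with partial pivoting
        pivot = max(range(col, u), key=lambda i: abs(A[i][col]))
        A[col], A[pivot] = A[pivot], A[col]
        for i in range(u):
            if i != col:
                f = A[i][col] / A[col][col]
                A[i] = [x - f * y for x, y in zip(A[i], A[col])]
    return [A[i][u] / A[i][i] for i in range(u)]


def relative_ls_error(S, T):
    """e_n = |T - S Y| / |T| with Y the LS solution, as in (46)."""
    Y = ls_solve(S, T)
    residual = [t - sum(s * y for s, y in zip(row, Y)) for row, t in zip(S, T)]
    return math.hypot(*residual) / math.hypot(*T)


def monte_carlo_error(b, realizations=500, seed=0):
    """Mean e_n over random T_b^n with entries drawn uniformly from [0, 1)."""
    rng = random.Random(seed)
    S = build_S(b)
    errors = [relative_ls_error(S, [rng.random() for _ in S])
              for _ in range(realizations)]
    return sum(errors) / len(errors)


print([round(monte_carlo_error(b, realizations=100), 3) for b in (1, 2, 3)])
```

Because the LS residual is an orthogonal projection of $T_b^n$, the error always lies in [0, 1]; it is zero only when $T_b^n$ happens to lie in the column space of $S_b$.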
Figure 5: LS error in estimating the t-link and intra-link edge weights for increasing number of labels.



3.2 Effect of LS error in edge weights on the s-t cut

Our inability to model the multi-way cut exactly as an s-t cut is captured by the LS error in estimating the edge weights. This error in the edge weights results in an error in the s-t cut (i.e. in the binary labeling), which is then decoded into a suboptimal solution to the multi-label problem. In Figure 7 and Figure 8, we quantify the error in the cut cost and the labeling accuracy due to edge weight errors for different numbers of labels. To this end, we create a graph $G$ with a proper topology (i.e. reflecting the 4-connectedness of 2D image pixels) and random edge weights (sampled uniformly from [0, 1]). We then construct $G_{LSE}$, a noisy version of $G$, by introducing errors in the edge weights modeled after the LS error (i.e. the norm of the error depends on the number of labels, according to the error analysis results in Figure 5 and Figure 6).

Figure 6: LS error in estimating the n-links for increasing number of labels. [Plot title: "Sy=t LSE (rep:500, intra:1, full:1, b type:1, z type:1, # labels: [2:256])"; x-axis: number of labels.]

The cut cost error $\Delta|C|$ is calculated using

$$\Delta|C| = \frac{\big|\,|C| - |C_{LSE}|\,\big|}{|C|} \qquad (47)$$

where $|C| = \sum_{e_{ij} \in C} w(e_{ij})$ is the cut cost of $G$ and $|C_{LSE}|$ is the cut cost of $G_{LSE}$. The labeling accuracy $ACC$ is calculated using

$$ACC = \frac{TP + TN}{|V|} \qquad (48)$$

where $TP + TN$ gives the total number of vertices correctly labelled as object or background (i.e. true positives and true negatives), and $|V|$ is the total number of vertices in the graph, which is equal to the number of pixels in the image times the number of bits needed to encode the different labels. The plots are the results of a Monte Carlo simulation of 10 realizations of $G$ and $G_{LSE}$ representing a 25 × 25-pixel image, with the number of labels ranging from 2 to 256. Note that LS errors were introduced into the edge weights even for the binary case (Section 2.4.2), which explains why $ACC < 1$ and $\Delta|C| > 0$ for $b = 1$. We obtained an average (over all numbers of labels and all noise realizations) $\Delta|C| = 0.094$ and $ACC = 0.864$, with standard deviations 0.0009 and 0.0054, respectively. Note also the encouraging behavior whereby $\Delta|C|$ and $ACC$ remain almost constant even for increasing number of labels. When we increased the number of pixels 16 times, to 100 × 100, and doubled the number of realizations to 20, the reported values for 128 labels remained almost constant, with an average $\Delta|C| = 0.0926$ and $ACC = 0.863$ and standard deviations 0.00018 and 0.0011, respectively. Note, however, that in an image segmentation scenario the image intensities will be corrupted by noise in addition to the errors introduced by the LS approximation.
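The two measures (47) and (48) reduce to a few lines. In this sketch (hypothetical helper names), a cut is given as a 0/1 side assignment of the vertices, including the terminals `'s'` and `'t'`, and edges are treated as undirected when summing the cut cost.

```python
def cut_cost(edges, side):
    """|C|: total weight of the edges whose endpoints lie on different sides.

    edges: iterable of (u, v, w); side: dict vertex -> 0 (source side) or
    1 (sink side), including the terminals 's' and 't'.
    """
    return sum(w for u, v, w in edges if side[u] != side[v])


def cut_cost_error(cost, cost_lse):
    """Delta|C| = | |C| - |C_LSE| | / |C|, as in (47)."""
    return abs(cost - cost_lse) / cost


def labeling_accuracy(labels, reference):
    """ACC = (TP + TN) / |V|, as in (48): fraction of identically labelled vertices."""
    matches = sum(labels[v] == reference[v] for v in reference)
    return matches / len(reference)


edges = [("s", "a", 3.0), ("s", "b", 1.0), ("a", "b", 2.0),
         ("a", "t", 1.0), ("b", "t", 4.0)]
side = {"s": 0, "a": 0, "b": 1, "t": 1}
print(cut_cost(edges, side))  # 4.0 (edges s-b, a-b and a-t are cut)
```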

It is important to emphasize that if we naively corrupt the edge weights of $G$ with random error rather than LS error, we obtain $ACC$ and $\Delta|C|$ values that vary with the number of labels. To show this, we create the graph $G$ as before, but now the noisy version of $G$ is created by simply adding noise sampled uniformly from [0, noise level] to the edge weights of $G$. The results are given in Figure 9 and Figure 10.
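A small, self-contained version of this random-noise experiment might look as follows. Everything here is an illustrative assumption rather than the setup behind Figures 9 and 10: a single binary layer (one vertex per pixel, so no multi-bit encoding), t-link and n-link weights drawn uniformly from [0, 1), a plain Edmonds-Karp max-flow/min-cut, and the noisy cut cost taken as the minimal cut cost of the noisy graph under its own weights. All names are hypothetical.

```python
import random
from collections import deque


def grid_graph(width, height, rng):
    """4-connected grid plus terminals 's' and 't'; weights uniform in [0, 1)."""
    edges = []
    for y in range(height):
        for x in range(width):
            p = (x, y)
            edges.append(("s", p, rng.random()))  # t-link to the source
            edges.append((p, "t", rng.random()))  # t-link to the sink
            if x + 1 < width:
                edges.append((p, (x + 1, y), rng.random()))
            if y + 1 < height:
                edges.append((p, (x, y + 1), rng.random()))
    return edges


def min_cut_side(edges):
    """Edmonds-Karp max-flow; returns {vertex: 0 if on the source side else 1}."""
    cap = {}
    for u, v, w in edges:  # each undirected edge becomes two residual arcs
        cap.setdefault(u, {}).setdefault(v, 0.0)
        cap.setdefault(v, {}).setdefault(u, 0.0)
        cap[u][v] += w
        cap[v][u] += w
    while True:
        parent = {"s": None}  # BFS for a shortest augmenting path
        queue = deque(["s"])
        while queue and "t" not in parent:
            u = queue.popleft()
            for v, c in cap[u].items():
                if c > 1e-12 and v not in parent:
                    parent[v] = u
                    queue.append(v)
        if "t" not in parent:
            # Vertices still reachable from s in the residual graph form the source side.
            return {v: 0 if v in parent else 1 for v in cap}
        path, v = [], "t"
        while parent[v] is not None:
            path.append((parent[v], v))
            v = parent[v]
        flow = min(cap[u][v] for u, v in path)
        for u, v in path:
            cap[u][v] -= flow
            cap[v][u] += flow


def cut_cost(edges, side):
    """Total weight of the edges crossing the cut."""
    return sum(w for u, v, w in edges if side[u] != side[v])


def noisy_cut_experiment(width, height, noise_level, seed=0):
    """Delta|C| and ACC between G and G plus uniform noise in [0, noise_level]."""
    rng = random.Random(seed)
    clean = grid_graph(width, height, rng)
    noisy = [(u, v, w + rng.uniform(0.0, noise_level)) for u, v, w in clean]
    side, side_noisy = min_cut_side(clean), min_cut_side(noisy)
    cost, cost_noisy = cut_cost(clean, side), cut_cost(noisy, side_noisy)
    pixels = [v for v in side if v not in ("s", "t")]
    acc = sum(side[p] == side_noisy[p] for p in pixels) / len(pixels)
    return abs(cost - cost_noisy) / cost, acc


for level in (0.0, 0.25, 0.5):
    print(level, noisy_cut_experiment(10, 10, level))
```

With `noise_level = 0` the two graphs are identical, so the sketch returns $\Delta|C| = 0$ and $ACC = 1$.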
Figure 7: Cut cost error Δ|C| for increasing number of labels when the error in edge weights is induced by the LS error.

Figure 8: Labeling accuracy ACC for increasing number of labels when the error in edge weights is modeled after the LS error.

Figure 9: Cut cost error Δ|C| for increasing number of labels as we corrupt the edge weights with random (not LS) error.

Figure 10: Labeling accuracy ACC for increasing number of labels as we corrupt the edge weights with random (not LS) error.



3.3 Image segmentation results 

We evaluate our algorithm's segmentation results on synthetic images by calculating the Dice similarity coefficient DSC (Figure 11). We tested increasing levels of additive white Gaussian noise (AWGN), with 9 standard deviation levels $\sigma \in \{0, 0.05, 0.10, \ldots, 0.40\}$, corrupting images $I(x, y)$ of ellipses with random orientations, random lengths of major/minor axes, and varying pixel intensities. We tested 15 numbers of labels $k \in \{2, 3, 4, \ldots, 16\}$ ($k - 1$ ellipses plus the background label). Sample images are shown in Figure 12. We examined 11 different values of $\lambda \in \{0, 0.1, 0.2, \ldots, 1\}$ (see (1)). We used the Potts label interaction penalty $V^l_{ij}$ together with a spatially varying Gaussian image intensity penalty $V^d_{ij} = \exp(-\beta (d_i - d_j)^2)$ with $\beta = 1$ (see Section 1.2). For each region (or label) $l$, 50% of the pixels of the noisy image (mimicking seeding) were used to learn a Gaussian probability density function $p_l(x) \sim N(\mu_l, \sigma_l)$ of the image intensity $x$ for that region. The data penalty $D_i(l_i)$ for each pixel $i$ with intensity $x_i$ was calculated as $(p_l(\mu_l) - p_l(x_i)) / p_l(\mu_l)$. We ran 10 realizations for each test case, i.e. a total of 14,850 segmentations (9 × 15 × 11 × 10).
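The penalties and the evaluation measure described above can be sketched as follows. The data penalty follows the text: for a Gaussian $p_l$ the normalizing constant cancels, so $(p_l(\mu_l) - p_l(x_i))/p_l(\mu_l) = 1 - \exp(-(x_i - \mu_l)^2 / (2\sigma_l^2))$. The DSC uses the standard Dice definition $2|A \cap B| / (|A| + |B|)$ per label. The variance floor `min_sigma` (needed for noise-free images, where all seeds of a region are identical), the assumption that every region has at least two pixels, and all helper names are ours.

```python
import math
import random


def fit_gaussian(seed_intensities, min_sigma=1e-6):
    """Mean and (floored) standard deviation of the seed pixels of one label."""
    n = len(seed_intensities)
    mu = sum(seed_intensities) / n
    sigma = math.sqrt(sum((x - mu) ** 2 for x in seed_intensities) / n)
    return mu, max(sigma, min_sigma)


def learn_label_models(image, ground_truth, k, rng, fraction=0.5):
    """Fit one Gaussian per label from a random `fraction` of its pixels (seeds).

    image, ground_truth: flat lists; each region is assumed to have >= 2 pixels.
    """
    models = []
    for label in range(k):
        values = [x for x, l in zip(image, ground_truth) if l == label]
        size = min(len(values), max(2, int(fraction * len(values))))
        models.append(fit_gaussian(rng.sample(values, size)))
    return models


def data_penalty(x, mu, sigma):
    """D_i(l) = (p_l(mu_l) - p_l(x_i)) / p_l(mu_l) for a Gaussian p_l."""
    return 1.0 - math.exp(-((x - mu) ** 2) / (2.0 * sigma ** 2))


def intensity_penalty(d_i, d_j, beta=1.0):
    """Spatially varying Gaussian penalty V^d_ij = exp(-beta (d_i - d_j)^2)."""
    return math.exp(-beta * (d_i - d_j) ** 2)


def dice(segmentation, ground_truth, label):
    """DSC = 2|A n B| / (|A| + |B|) for one label over flat label lists."""
    a = {i for i, l in enumerate(segmentation) if l == label}
    b = {i for i, l in enumerate(ground_truth) if l == label}
    return 2.0 * len(a & b) / (len(a) + len(b)) if a or b else 1.0


models = learn_label_models([0.1, 0.1, 0.9, 0.9], [0, 0, 1, 1], 2, random.Random(0))
print([round(data_penalty(0.9, mu, s), 3) for mu, s in models])  # [1.0, 0.0]
```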

From Figure 11, we note a high DSC for small numbers of labels and small noise levels and, as expected, a gradually decreasing DSC with increasing number of labels and noise. Note, for example, that the topmost (blue) curve, for σ = 0.05, shows almost perfect segmentation (DSC ≈ 1), whereas the second-from-top (green) curve, for σ = 0.1, shows that DSC drops below 1 when the number of labels is 9 or higher. For σ = 0.15, this drop occurs earlier, at 5 labels.

We also present qualitative segmentation results on synthetic data (Figure 12) and on magnetic resonance brain images (Figure 13) from BrainWeb [11].



Figure 11: Dice similarity coefficient DSC between the ground-truth segmentation and our method's segmentation versus increasing number of labels, for different noise levels (different colors). [Plot: DSC versus number of labels, 2 to 16.]



Figure 12: Sample qualitative results on images of ellipses with k labels (k − 1 ellipses plus background) and noise level σ. (Top row) sample intensity images; (remaining rows) labeling results. [Columns: k = 3, k = 5, k = 10.]
4 Conclusions



Multi-label MRF optimization is a challenging problem, especially with non-trivial label-interaction priors. Algorithms that address these challenges have numerous implications for a variety of computer vision applications (e.g. segmentation, stereo reconstruction, etc.). We presented a novel approach to multi-label MRF optimization. Rather than labeling a single vertex with one of k labels, each vertex is first replaced by b = ceil(log2(k)) new vertices, and every new vertex is binary-labelled. The binary labeling of the new vertices encodes the original k labels, effectively approximating the multi-label problem with a globally and non-iteratively solvable s-t cut. With b vertices replacing each original vertex, a new graph topology emerges, whose edge weights are approximated using a novel LS approach, derived from a system of linear equations capturing the original multi-label MRF energy without any restrictions on the interaction priors. The pseudoinverse used in the LS solution is pre-computed offline only once and reused for different graphs and vertices. We quantitatively evaluated different properties of the proposed approximation method and demonstrated the application of our approach to image segmentation (with qualitative and quantitative results on synthetic and brain images).

Future research will focus on addressing some of the deficiencies of the presented work as well as on exploring ideas for improvements. The segmentation results will likely be improved with proper optimization of the free parameters (Section 1.2), e.g. the choice of the label-interaction prior $V^l_{ij}(l_i, l_j)$, the spatially adaptive data interaction $V^d_{ij}(d_i, d_j)$, their associated parameters T and β, and λ, which balances the data and prior terms. Following such parameter optimization, it will be essential to compare our approach with other multi-label segmentation methods. For segmentation of images that are more complex than scalar intensity images, e.g. color images, diffusion tensor magnetic resonance images, dynamic positron emission tomography images, etc., the data interaction term must be replaced to better capture distances between vector and tensor pixels rather than scalar pixels. We also plan to evaluate the performance of the method on computer vision problems that necessitate non-metric label interaction.

We noted that Δ|C| and ACC remain almost constant with increasing number of labels when corrupting the graphs with LS error rather than random noise (Figure 7 and Figure 8). We speculate that the reason is that the number of unknowns does not increase as fast as the number of equations, but this remains to be further investigated and formally explored.